Skip to content

Conversation

wks
Copy link
Collaborator

@wks wks commented Sep 1, 2025

We rewrite the function into a more linear style and remove duplicated code of handling blocking for GC.

Although we do not intend to change the logic of Space::acquire in this PR, we still fix some obvious bugs in the previous Err(_) => { ... } code path.

  • Previously the on_pending_allocation after failing to get new pages from PageResource did not include side metadata. Now it always includes side metadata.

We rewrite the function into a more linear style and remove duplicated
code of handling blocking for GC.

Although this PR does not intend to change the logic of Space::acquire,
it does fix some obvious bugs.

-   Previously if the current thread is a mutator, or GC is not enabled,
    the thread will attempt to poll and block for GC if it fails to get
    new pages from the PageResource.  Now it panicks.

-   Previously the `on_pending_allocation` after failing to get new
    pages from PageResource did not include side metadata.  Now it
    always includes side metadata.
@wks wks marked this pull request as ready for review October 9, 2025 07:40
@wks wks added the PR-extended-testing Run extended tests for the pull request label Oct 9, 2025
@wks wks requested a review from qinsoon October 9, 2025 07:41
@wks
Copy link
Collaborator Author

wks commented Oct 9, 2025

This PR is now ready for review. This PR is a preparatory step for #1382 which we recently proposed. This PR does not change the semantics of Space::acquire (except for obvious bugs), but should make the following decisions easier to handle, in terms of how they affect the control flow: (1) whether we can block for GC (depending on whether we are at safepoint), (2) whether we can over-commit, and (3) whether we are allowed to poll.

@wks
Copy link
Collaborator Author

wks commented Oct 9, 2025

The test case mock_test_allocate_without_initialize_collection failed and entered an infinite loop. When GC is not initialized, it should panic, but somehow failed to panic. I'll try to make fewer changes.

@wks wks mentioned this pull request Oct 9, 2025
2 tasks
@qinsoon
Copy link
Member

qinsoon commented Oct 9, 2025

I agree that removing duplicate code is a good idea, but I’m not convinced that refactoring acquire into a more 'linear' style improves readability.

The original logic was structured roughly like this:

if should_poll && poll {
	do gc
} else {
	allocate
	if success {
		mmap/zero/assert
	} else {
		poll
		do gc
	}
}

After refactoring, it becomes:

if should_poll {
	poll
}
if should_allocate {
	allocate
	if success {
		mmap/zero/assert
		return
	}
}
if failed_allocation {
	poll
}
do gc

In the original impl, the cascading if/else structure made the control flow and conditions explicit -- it’s immediately clear which path you’re in and why. The new 'linear' version spreads out the conditions, so when reading later parts of the function, you need to track all preceding conditions and be aware of early returns. This makes it harder to reason about the logic.

The 'do gc' part might be the only significant duplicated code here. It is reasonable to just extract it into a closure and reuse.

@wks
Copy link
Collaborator Author

wks commented Oct 10, 2025

The advantage of the new style is that the condition for polling, allocating, and doing GC are independent. It matches the new API proposed in #1400 (i.e. using boolean options) better. For example, the current control flow is

if should_poll && poll() {
    schedule and block for GC
} else {
    allocate
    if failed {
        poll
        schedule and block for GC
    }
}

If we allow both polling and over-committing, the control flow will become

if should_poll && poll() && !allow_overcommit {
    schedule and block for GC // reach here when disallowing overcommit
} else {
    allocate
    if failed {
        poll
        schedule and block for GC // reach here when allowing overcommit
    }
}

First of all, the !allow_overcommit part in the if statement is confusing. should_poll && poll() && !allow_overcommit does two things:

  1. Conditionally call poll(). It should call poll() if should_poll is true.
  2. Decide the condition of whether to attempt allocation, or go to the code path of "scheduling and blocking for GC". It should try allocating if GC is not triggered or over-committing is allowed.

I think it is bad to mix control flow (to call poll(), or not to call poll()) and condition computing (whether to allocate) in the condition part of the if statement. That's why I lifted polling into its own if statement: if should_poll { gc_triggerd = poll() }, and compute the condition of whether to attempt allocation separately. As I mentioned in the comment, we can change let should_try_to_allocate = !gc_triggered; to let should_try_to_allocate = !gc_triggered || allow_overcommit; later.

Secondly, the duplicated code blocks for schedule and block for GC becomes more confusing if we add over-committing into the play. Now if allow_overcommit is true, it will enter the second schedule and block for GC instead of the first. It is confusing because both code blocks handle the case that we can't allocate (whether we attempted allocation or not).

The new style emphasizes that the main thing Space::acquire does is allocating. That's why if should_allocate { allocate } is the centerpiece of this function. One advantage of this is that we guarantee the self.common().acquire_lock is always released after the if should_allocate { allocate } block.

Speaking of early returns, there are only two early returns in this function.

  • One is when the allocation is successful. I think this is a proper use of early return, because the rest part of this function handles allocation failure.
  • The other early return is when alloc_options.on_fail.allow_gc() is false. I am OK with converting this early return into if ... { return 0; } else { ... } return 0; or simply if allow_gc() { ...} return 0; But note that Use boolean allocation options #1400 adds an at_safepoint option, and it will be used as part of this condition, too.

I think if failed_allocation { poll } expresses its intent. But since #1400 also adds an allow_polling option, the condition will be different after that. And if we are not allowed to poll, we should not block for GC, either.

@wks
Copy link
Collaborator Author

wks commented Oct 10, 2025

Oops I realized that the code in the error-handling trailer. VM::VMCollection::block_for_gc should only be called if gc_triggered is true. But we haven't thought through the case where allow_polling == false && at_safepoint == true. Maybe allow_polling is not the right option in the first place, and it should be eager_polling which only controls the first poll but not the second poll (which is a force poll and must succeed).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR-extended-testing Run extended tests for the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants